@cometkim commented Jul 30, 2025

Can Claude Code optimize? Let's see

Claude suggested several improvements that wouldn't break the test.

Verifying the claims

While the claims sound reasonable, theory and reality can differ, so I broke down the suggestions and verified them separately.

Claim 1. Binary search operation

Problem: Used signed right shift (>>) in binary search
Solution: Changed to unsigned right shift (>>>) for better performance
Impact: Minor but consistent improvement in Unicode range lookups

Experimented in #79

Result: False (but helpful suggestion)

It made no difference, even though this is the most frequently hit code path.

>>> and >> shouldn't differ here; the codebase already handles its integers carefully, so the operands never go negative. However, I accepted the change because it makes the code a bit more compact.
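
For reference, the lookup in question is roughly shaped like the sketch below. This is a hypothetical illustration of a sorted-range binary search, not the library's actual table code; it only shows why the two shift operators behave identically when the operands stay small and non-negative.

// Hypothetical sketch of a sorted-range lookup (not the actual library code).
// `ranges` is a flat array of [start, end) code point pairs, sorted by start.
function inRanges(ranges, code) {
  let lo = 0;
  let hi = (ranges.length >>> 1) - 1;
  while (lo <= hi) {
    // `(lo + hi) >>> 1` and `(lo + hi) >> 1` produce the same midpoint here,
    // because lo + hi is always a small non-negative integer for Unicode tables.
    let mid = (lo + hi) >>> 1;
    if (code < ranges[2 * mid]) {
      hi = mid - 1;
    } else if (code >= ranges[2 * mid + 1]) {
      lo = mid + 1;
    } else {
      return true;
    }
  }
  return false;
}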

Claim 2. String build optimization

Problem: Character-by-character string concatenation using segment += input[cursor++]
Solution: Replace with input.slice(segmentStart, cursor) to avoid repeated string allocations
Impact: Significant reduction in memory allocations and GC pressure

Experimented in #80

Result: True

The suggestion makes perfect sense, but not in every case.

I've confirmed that string concatenation does indeed incur overhead, but it's only a problem when the segment is longer than a single character. In the typical case (the Latin alphabet), it's likely to be a single character.

// Case 1: BMP, a single code unit, so no string concatenation at all.
segment = input[cursor];

// Case 2: >= SMP, the segment grows by repeated concatenation.
segment += input[cursor++];

However, the larger the segment, the larger the performance improvement. In extreme cases, such as demonic (Zalgo-style) characters that stack many combining marks, the gain is over 100%.
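
To make the comparison concrete, here is a hypothetical sketch of the two strategies; the helper names are illustrative and not the segmenter's actual loop.

// Concatenation: one intermediate string allocation per appended code unit.
function buildByConcat(input, start, end) {
  let segment = '';
  for (let cursor = start; cursor < end; cursor++) {
    segment += input[cursor];
  }
  return segment;
}

// Slicing: a single allocation for the whole segment, regardless of its length.
function buildBySlice(input, start, end) {
  return input.slice(start, end);
}

For single-code-unit segments the two are effectively the same, which is why the typical alphabetic case sees little benefit.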

Claim 3. Inlining

Problem: All characters went through generic cat() function with binary search
Solution: Inline category detection for ASCII characters (< 127) directly in the main loop
Impact: ~90% of characters in typical text get faster processing

Experimented in #81

Result: False

The suggestion didn't really help.

It only bloats the code; there was no measurable impact on performance.
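
For context, the suggested fast path looked roughly like the sketch below. The function and category names are hypothetical stand-ins for the library's internals, not its real API.

// Hypothetical sketch of the suggested ASCII fast path (not the real cat()).
function category(code) {
  if (code < 0x7f) {
    // Inlined classification for ASCII, skipping the binary search entirely.
    if (code === 0x0d) return 'CR';
    if (code === 0x0a) return 'LF';
    if (code < 0x20) return 'Control';
    return 'Other';
  }
  // Everything else still goes through the generic range lookup.
  return categoryFromRanges(code);
}

// Stand-in for the generic binary search over the Unicode range tables.
function categoryFromRanges(code) {
  return 'Other';
}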

Claim 4. Prioritizing common cases

Problem: Boundary rules were checked in specification order, not frequency order
Solution: Reordered isBoundary() checks to handle most common cases first:

  • GB9/GB9a (extend rules) moved to top as they're the most frequent "no break" cases
  • GB3 (CR x LF) must come before GB4/GB5 to handle correctly
  • Simplified Hangul rules for better performance

Impact: Faster short-circuiting for common character sequences

Experimented in #82

Result: True? (not 100% sure)

The claim is valid. Prioritizing the most common cases is a legitimate strategy. But assuming you already know which case is the most common can be inaccurate, or even dangerous.

The suggested change made the code less intuitive and had no impact on performance. However, it gave a hint for eliminating some potentially unnecessary branches.
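
As a rough illustration, the reordered check looks something like the sketch below. The rule names follow UAX #29 (GB3, GB4/GB5, GB9/GB9a), but the code is hypothetical and not the library's actual isBoundary() implementation.

// Hypothetical sketch of the reordered boundary check (not the real code).
function isBoundary(prev, next) {
  // GB9/GB9a hoisted to the top: Extend, ZWJ, and SpacingMark are the most
  // frequent "do not break" cases in typical text. The extra guard on `prev`
  // keeps GB4 (break after Control/CR/LF) intact after the reordering.
  if (
    (next === 'Extend' || next === 'ZWJ' || next === 'SpacingMark') &&
    prev !== 'Control' && prev !== 'CR' && prev !== 'LF'
  ) {
    return false;
  }
  // GB3 must come before GB4/GB5 so that CR x LF never breaks.
  if (prev === 'CR' && next === 'LF') return false;
  // GB4/GB5: break around controls otherwise.
  if (prev === 'Control' || prev === 'CR' || prev === 'LF') return true;
  if (next === 'Control' || next === 'CR' || next === 'LF') return true;
  // ...remaining rules (Hangul, emoji sequences, regional indicators) follow.
  return true;
}

The guard on prev is exactly the kind of detail that makes the reordered version less intuitive to read.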

Lessons on using LLM agents

  • An LLM can do basic optimization, but it doesn't help much on an already optimized path.
  • Suggestions may be incorrect or based on insufficient evidence.
  • Because no benchmark is perfect, be cautious about making major changes based on one.
  • It's still your responsibility to interpret and verify the details.
  • They rarely give helpful hints; it's up to you.
